DNA sequence assembly and multiple sequence alignment by an Eulerian path approach.

Authors

  • Y Zhang
  • M S Waterman
Abstract

We describe an Eulerian path approach to DNA fragment assembly that originated with Idury and Waterman (1995) and was then advanced by Pevzner et al. (2001b). This combinatorial approach bypasses the traditional “overlap-layout-consensus” approach and has successfully resolved some of the troublesome repeats in practical assembly projects. The assembly results produced by the Eulerian path approach are accurate, and its computation is significantly more efficient than that of other assembly programs. As an extension, we use the Eulerian path idea to address the multiple sequence alignment problem. In particular, our goal is to align thousands of sequences simultaneously, which is computationally prohibitive for all existing alignment algorithms. As a beginning, we focus on DNA sequence alignment. Our method can align hundreds of DNA sequences within minutes with high accuracy, and its computational time is linear in the number of sequences. We demonstrate its performance on alignments of simulated sequences and in an application to a resequencing project of Arabidopsis thaliana. Although it has some weaknesses, including in aligning gap-rich regions, the Eulerian path approach is distinguished from other existing algorithms in solving either fragment assembly or multiple alignment
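The core idea named in the abstract, replacing overlap-layout-consensus with an Eulerian path over k-mers, can be illustrated with a toy sketch. This is not the authors' program: it assumes error-free reads, a single strand, and a graph that admits exactly one Eulerian path, and the function names (`de_bruijn_graph`, `eulerian_path`, `assemble`) are hypothetical. It builds a de Bruijn graph whose edges are the distinct k-mers in the reads and walks it with Hierholzer's algorithm.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes of the distinct k-mers."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the walk deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm. Assumes (does not check) that an Eulerian path exists."""
    in_deg = defaultdict(int)
    for successors in graph.values():
        for v in successors:
            in_deg[v] += 1
    # Start where out-degree exceeds in-degree by one; fall back to any node (cycle case).
    start = next(
        (u for u in graph if len(graph[u]) - in_deg[u] == 1),
        next(iter(graph)),
    )
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: emit node
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path: first node, then last letter of each next node."""
    path = eulerian_path(de_bruijn_graph(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    print(assemble(["ATGGCGT", "GGCGTGC", "CGTGCA"], 4))
```

In this toy input the three overlapping reads share k-mers, so each distinct 4-mer becomes one edge and the walk visits every edge once. Real assemblers must additionally handle sequencing errors, both strands, and repeats that create multiple valid paths, none of which this sketch attempts.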

Similar resources

An Eulerian Path Approach to Global Multiple Alignment for DNA Sequences

With the rapid increase in the dataset of genome sequences, the multiple sequence alignment problem is increasingly important and frequently involves the alignment of a large number of sequences. Many heuristic algorithms have been proposed to improve the speed of computation and the quality of alignment. We introduce a novel approach that is fundamentally different from all currently available...


Eulerian Path Methods for Multiple Sequence Alignment

With the rapid increase in the size of genome sequence databases, the multiple sequence alignment problem is increasingly important and often requires the alignment of a large number of sequences. Beginning in 1975, many heuristic algorithms have been created to improve the speed of computation and the quality of alignment. We introduce a novel approach that is fundamentally distinct from all c...


An Application of the ABS LX Algorithm to Multiple Sequence Alignment

We present an application of ABS algorithms for multiple sequence alignment (MSA). The Markov decision process (MDP) based model leads to a linear programming problem (LPP), whose solution is linked to a suggested alignment. The important features of our work include the facility of alignment of multiple sequences simultaneously and no limit for the length of the sequences. Our goal here is to ...


An Eulerian path approach to local multiple alignment for DNA sequences.

Expensive computation in handling a large number of sequences limits the application of local multiple sequence alignment. We present an Eulerian path approach to local multiple alignment for DNA sequences. The computational time and memory usage of this approach are approximately linear in the total size of the sequences analyzed; hence, it can handle thousands of sequences or millions of letters s...


gpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences

Bioinformatics, through the sequencing of the full genomes for many species, is increasingly relying on efficient global alignment tools exhibiting both high sensitivity and specificity. Many computational algorithms have been applied for solving the sequence alignment problem. Dynamic programming, statistical methods, approximation and heuristic algorithms are the most common methods appli...



Journal:
  • Cold Spring Harbor symposia on quantitative biology

Volume 68


Publication date: 2003